W-kmeans: Clustering News Articles Using WordNet

Authors

  • Christos Bouras
  • Vassilis Tsogkas
Abstract

Document clustering is a powerful technique that has been widely used for organizing data into smaller, manageable information kernels. Several approaches have been proposed; however, they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. We propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms in a twofold manner: enriching the “bag of words” used prior to the clustering process, and assisting the label generation procedure that follows it. Our experiments revealed a significant improvement over standard k-means on a corpus of news articles derived from major news portals. Moreover, the cluster labeling process generates useful, high-quality cluster tags.
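
The twofold use of hypernyms described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how many hypernyms are added, how they are weighted, or how k-means is initialized. Everything named here is an assumption made for illustration, including the hand-written TOY_HYPERNYMS table (a stand-in for real WordNet lookups), the cosine-similarity assignment step, the farthest-first seeding (chosen only to keep the run deterministic), and the helper names enrich, kmeans, and label.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: word -> list of direct hypernyms.
TOY_HYPERNYMS = {
    "striker": ["athlete"], "goalkeeper": ["athlete"],
    "senator": ["politician"], "minister": ["politician"],
}

def enrich(tokens, hypernyms=TOY_HYPERNYMS, weight=1.0):
    """Build a bag of words and add each token's hypernyms to it."""
    bag = Counter(tokens)
    for tok in tokens:
        for hyper in hypernyms.get(tok, []):
            bag[hyper] += weight  # weight is an assumed tuning knob
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_bag(bags):
    """Component-wise mean of several bags (the cluster centroid)."""
    total = Counter()
    for bag in bags:
        total.update(bag)
    return {k: v / len(bags) for k, v in total.items()}

def farthest_first(bags, k):
    """Deterministic seeding: repeatedly pick the bag least similar to the seeds."""
    chosen = [bags[0]]
    while len(chosen) < k:
        chosen.append(min(bags, key=lambda b: max(cosine(b, c) for c in chosen)))
    return chosen

def kmeans(bags, k, iters=20):
    """Plain k-means loop using cosine similarity for the assignment step."""
    centroids = farthest_first(bags, k)
    assign = None
    for _ in range(iters):
        new_assign = [max(range(k), key=lambda c: cosine(bag, centroids[c]))
                      for bag in bags]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for c in range(k):
            members = [b for b, a in zip(bags, assign) if a == c]
            if members:
                centroids[c] = mean_bag(members)
    return assign, centroids

def label(centroid, hypernyms=TOY_HYPERNYMS, top=1):
    """Label a cluster with its heaviest hypernym terms."""
    vocab = {h for hs in hypernyms.values() for h in hs}
    ranked = sorted((w for w in centroid if w in vocab),
                    key=centroid.get, reverse=True)
    return ranked[:top]

# Four toy "articles" that share no surface words with each other.
DOCS = [
    ["striker", "scores", "goal"],
    ["goalkeeper", "saves", "penalty"],
    ["senator", "backs", "bill"],
    ["minister", "signs", "treaty"],
]

bags = [enrich(doc) for doc in DOCS]
assign, centroids = kmeans(bags, k=2)
for doc, cluster in zip(DOCS, assign):
    print(cluster, label(centroids[cluster]), doc)
```

In this toy corpus no two articles share a word, so a plain bag of words gives every pair a similarity of zero; the added hypernyms are the only signal that links the two sports items and the two politics items, and the same hypernyms then serve as candidate cluster labels. With real data, the lookup table would be replaced by queries against WordNet itself.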

Similar resources

Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem

Collaborative filtering systems typically need to acquire some data about the new user in order to start making personalized suggestions, a situation commonly referred to as the “new user problem”. In this work we attempt to address the new user problem via a unique personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordN...

A clustering technique for news articles using WordNet

C. Bouras, V... (dx.doi.org/10.1016/j.knosys.2012.06.015). The Web is overcrowded with news articles, an overwhelming information source in both its volume and its diversity. Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed which, however,...

Enhancing News Articles Clustering using Word N-Grams

In this work we explore the possible enhancement of the document clustering results, and in particular clustering of news articles from the web, when using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach that combines clustering of news articles derived from the web using n-grams, extracted from the articles at an offline stage. We compared t...

Improving news articles recommendations via user clustering

Although commonly only item clustering is suggested by Web mining techniques for news articles recommendation systems, one of the various tasks of personalized recommendation is categorization of Web users. With the rapid explosion of online news articles, predicting user-browsing behavior using collaborative filtering (CF) techniques has gained much attention in the web personalization area. H...

User Personalization via W-kmeans

With the rapid explosion of online news articles, predicting user-browsing behavior using collaborative filtering techniques has gained much attention in the web personalization area. However, common collaborative filtering techniques suffer from low accuracy and performance. This research proposes a new personalized recommendation approach that integrates user and text clustering based on our d...

Publication date: 2010